A Word Stemming Algorithm for the Spanish Language
Abstract
This paper describes a word stemming algorithm for the Spanish language. Experiments in document retrieval on English text suggest that stemming based on morphological analysis does not generally or consistently outperform ad hoc, hand-tuned algorithms such as the one proposed by Porter (Porter M., 1980). Producing a Porter-style algorithm for a Romance language such as Spanish is difficult, however, because of its greater grammatical complexity and because inflection often changes the root of a word, not just its ending (as is the case with English). In general terms, the challenge is to produce an algorithm that can cope with the additional complexity of Spanish morphology while preserving the simplicity of a Porter-style algorithm. One such algorithm is presented in this paper. It combines dictionary lookups with some 300 stemming and intermediate reduction rules.
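The abstract describes the method only at a high level: dictionary lookups combined with some 300 stemming and intermediate reduction rules. The Python sketch below shows the general shape of such a hybrid, under stated assumptions. A tiny hypothetical exception dictionary (`EXCEPTIONS`) handles root changes such as *puedo* / *poder*, and a short hypothetical rule list (`RULES`) is applied repeatedly, first match wins, with longer suffixes listed first. Reading "intermediate reduction" as a rule that rewrites one ending into another that a later pass strips is our assumption. None of these rules or dictionary entries are taken from the paper.

```python
"""Toy hybrid stemmer for Spanish: dictionary lookup plus suffix rules.

Both tables below are hypothetical illustrations, not the rules from the paper.
"""

# Hypothetical exception dictionary: irregular forms whose root changes
# under inflection (stem-changing verbs), mapped straight to a stem.
EXCEPTIONS: dict[str, str] = {
    "puedo": "pod",    # poder: o -> ue in the root
    "pueden": "pod",
    "quiero": "quer",  # querer: e -> ie in the root
    "tengo": "ten",    # tener: irregular first person
    "tienen": "ten",   # tener: e -> ie in the root
}

# Hypothetical (suffix, replacement) rules, longer suffixes listed first.
# A non-empty replacement is an assumed "intermediate reduction": it rewrites
# the ending into one a later pass strips (plural -aciones loses the accent).
RULES: list[tuple[str, str]] = [
    ("aciones", "ación"),
    ("amente", ""),
    ("ación", ""),
    ("iendo", ""),
    ("ando", ""),
    ("ar", ""), ("er", ""), ("ir", ""),
    ("es", ""), ("os", ""), ("as", ""),
    ("o", ""), ("a", ""), ("s", ""),
]


def stem(word: str, min_stem: int = 3) -> str:
    """Return a stem for `word`, never shortening it below `min_stem` chars."""
    word = word.lower()
    # 1. Dictionary lookup handles root changes that suffix rules cannot.
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    # 2. Apply the first matching rule, then rescan. Every replacement is
    #    shorter than its suffix, so the word shrinks and the loop ends.
    changed = True
    while changed:
        changed = False
        for suffix, replacement in RULES:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[: len(word) - len(suffix)] + replacement
                changed = True
                break
    return word


if __name__ == "__main__":
    for w in ["informar", "información", "informaciones",
              "hablar", "hablando", "poder", "puedo"]:
        print(f"{w} -> {stem(w)}")
```

Running the script maps *informar*, *información* and *informaciones* to `inform` (the last one via the intermediate step *informaciones* → *información* → *inform*), *hablar* and *hablando* to `habl`, and both *poder* and *puedo* to `pod`. The last pair conflates only because of the dictionary entry, which illustrates why a purely suffix-based, Porter-style approach struggles when inflection alters the root.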
Publication date: 2000